The goal of this project is to use a model-based approach to recognize human facial emotion from images, since deep learning architectures allow automatic feature extraction and classification.
FER2013: Publicly available online, it contains 32298 grayscale, labeled images in a consistent format, all of uniform dimension 48x48. Facial emotions are highlighted by centering the face within each frame.
KDEF: There are 4900 pictures of 562x762 pixels, covering 70 individuals between 20 and 30 years old; each individual displays 7 different expressions, and each expression is photographed from 5 different angles.
For the FER2013 dataset
For the KDEF dataset
The model that has demonstrated the greatest accuracy in facial emotion image recognition is the Deep-Emotion model, evaluated on the FER2013, CK+ and FERG databases for 7 classes: Angry, Disgust, Fear, Happy, Sad, Surprised and Neutral. Below are the emotion recognition classification accuracies of the proposed model on the FER2013, CK+ and FERG datasets in comparison to other models, without any extra training data; they show that it is competitive with, and outperforms, other expression recognition models.
The proposed facial recognition system uses the pre-trained deep CNN model VGG-16, originally trained to classify 1000 object categories, with the KDEF and JAFFE datasets. The first CNN layer captures simple features such as edges and corners of the image; the following CNN layer detects more complex features such as shapes, and the upper layers follow the same pattern to learn increasingly complex features. The pre-trained model is modified for emotion recognition by redefining the dense layers, and then fine-tuning is performed with emotion data. The last dense layers of the pre-trained model are replaced with new dense layers that classify a facial image into one of 7 emotion classes (afraid, angry, disgusted, sad, happy, surprised, and neutral). Fine-tuning is performed on the architecture consisting of the convolutional base of the pre-trained model plus the added dense layers. Data preprocessing includes resizing, cropping, and other tasks used to prepare the training data for fine-tuning.
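The head-replacement step described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' exact code: the dense-layer width (256), the dropout rate, and the 48x48 input size are assumptions, and `weights=None` is used here only to avoid downloading the ImageNet weights that the real experiments would load with `weights="imagenet"`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Load the VGG16 convolutional base without its original 1000-class head.
# The actual experiments would use weights="imagenet"; None keeps this sketch offline.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(48, 48, 3))
base.trainable = False  # freeze the convolutional base before fine-tuning

# Replace the original dense layers with new ones for the 7 emotion classes.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # illustrative width, not from the report
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # afraid, angry, disgusted, sad, happy, surprised, neutral
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

After the new head converges, fine-tuning proceeds by unfreezing some or all of the convolutional base and continuing training at a lower learning rate.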
2 convolution blocks (convolution layer + pooling layer), followed by a convolution layer and a fully connected layer to classify the 7 classes
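The baseline architecture above can be sketched as a small Keras model. The filter counts and kernel sizes here are illustrative assumptions; only the layer layout (two conv+pool blocks, one extra conv layer, one fully connected classifier) follows the description.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the baseline CNN; filter counts are assumptions.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),          # FER2013 images are 48x48 grayscale
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                    # block 1: convolution + pooling
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                    # block 2: convolution + pooling
    layers.Conv2D(128, 3, activation="relu", padding="same"),  # extra convolution layer
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),    # fully connected classifier, 7 classes
])
```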
We trained for 50 epochs instead of 30 and added a dropout layer before the last layer
Same experiment as update 2, but on the KDEF dataset instead of the FER2013 dataset
Added a dropout layer to the previous experiment to reduce overfitting
Froze the first 3 layers and unfroze the 4th and 5th layers
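Selective freezing like this is done in Keras by toggling each layer's `trainable` flag. A minimal sketch, assuming the "layers" being frozen are VGG16's five convolutional blocks (whose layer names are prefixed `block1_` through `block5_`); `weights=None` again stands in for the pre-trained weights to keep the sketch offline.

```python
import tensorflow as tf

# Assumption: "first 3 layers" refers to VGG16 blocks 1-3.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(48, 48, 3))
for layer in base.layers:
    # Keep blocks 4 and 5 trainable; freeze everything else.
    layer.trainable = layer.name.startswith(("block4", "block5"))
```

Only the unfrozen blocks receive gradient updates during fine-tuning, which adapts the most task-specific features while preserving the generic low-level filters.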
Trained the full VGG16 network by unfreezing all the layers
Unfroze layers starting from the conv2d_260 CNN layer until the end of the pre-trained model
Froze layers up to the res5a_branch2a layer, then unfroze the rest
Froze the first 3 layers and unfroze the last 2 layers with augmentation, and changed the optimizer to SGD with a learning rate of 5*10^-3, momentum = 0.8, and decay = 0.0005
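The SGD configuration above can be expressed in current Keras as follows. The legacy `decay` argument corresponded to an inverse-time learning-rate schedule, lr_t = lr / (1 + decay * t), which newer versions express with an explicit schedule object; mapping `decay=0.0005` onto `InverseTimeDecay` is my interpretation, not stated in the report.

```python
import tensorflow as tf

# Reproduce the legacy `decay` behaviour with an explicit schedule:
# lr_t = initial_lr / (1 + decay_rate * t / decay_steps)
schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=5e-3,  # 5*10^-3 as in the experiment
    decay_steps=1,
    decay_rate=0.0005)           # the reported decay value

optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.8)
```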
Average of 3 models:
Weighted Average of 3 models:
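Both ensembling schemes above combine the per-model softmax outputs; the plain average is the weighted average with uniform weights. A small NumPy sketch (the helper name and the toy weights are mine, not from the report):

```python
import numpy as np

def weighted_ensemble(prob_list, weights=None):
    """Combine per-model class probabilities; uniform weights give the plain average."""
    probs = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise so outputs stay probabilities
    return np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)

# Toy softmax outputs from 3 hypothetical models on one sample, 3 classes:
p = [np.array([[0.6, 0.3, 0.1]]),
     np.array([[0.2, 0.5, 0.3]]),
     np.array([[0.1, 0.2, 0.7]])]
avg = weighted_ensemble(p)                    # plain average of the 3 models
wavg = weighted_ensemble(p, [0.5, 0.3, 0.2]) # weighted average of the 3 models
```

Weights are typically chosen in proportion to each model's validation accuracy, so stronger models contribute more to the final prediction.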
Categorical training accuracy is 96.4%, while categorical validation accuracy is 55.9%, indicating high overfitting
Higher categorical training accuracy of 97.1% and validation accuracy of 59.3%; it also overfits
Higher accuracy than the one reported in the paper
Validation accuracy decreased
Accuracy: 97.53%
Accuracy: 14.06%
Accuracy: 99.56%
65.37% Training accuracy
Validation Accuracy of 87.47%
Validation Accuracy 54.28%
Validation Accuracy 97.2%
Key observations from the training experiments:
-Performing image augmentation improved the validation accuracy, but at the expense of slightly lowering the training accuracy
-Overfitting occurs when training on the FER2013 data even with regularization techniques, but training on the KDEF dataset performed better and produced higher validation accuracy with very minor overfitting.
-Pre-trained VGG16 and pre-trained InceptionV3 with fine-tuned layers performed well on both validation and training data, but ResNet performed poorly and underfit the training data.
-For future work, we can deploy the model in real time and further fine-tune the parameters. We can also experiment with more datasets and find ways to handle obstacles such as varying illumination.
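The image augmentation mentioned in the observations can be sketched with Keras preprocessing layers. The specific transforms and their ranges here are assumptions for illustration, not the exact pipeline used in the experiments.

```python
import numpy as np
import tensorflow as tf

# Illustrative augmentation pipeline; transforms and ranges are assumptions.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirrored faces keep the same emotion label
    tf.keras.layers.RandomRotation(0.1),       # up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.1),
])

# Apply to a fake FER2013-sized batch; training=True enables the random transforms.
batch = np.random.rand(4, 48, 48, 1).astype("float32")
augmented = augment(batch, training=True)
```

Because the transforms are applied on the fly each epoch, the network sees a slightly different version of every image, which acts as a regularizer and explains the improved validation accuracy at a small cost in training accuracy.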